Improved Neural Bag-of-Words Model to Retrieve Out-of-Vocabulary Words in Speech Recognition
نویسندگان
چکیده
Many Proper Names (PNs) are Out-Of-Vocabulary (OOV) words for speech recognition systems used to process diachronic audio data. To enable recovery of the PNs missed by the system, relevant OOV PNs can be retrieved by exploiting the semantic context of the spoken content. In this paper, we explore the Neural Bag-of-Words (NBOW) model, proposed previously for text classification, to retrieve relevant OOV PNs. We propose a Neural Bag-of-Weighted-Words (NBOW2) model in which the input embedding layer is augmented with a context anchor layer. This layer learns to assign importance to input words and has the ability to capture (task specific) key-words in a NBOW model. With experiments on French broadcast news videos we show that the NBOW and NBOW2 models outperform earlier methods based on raw embeddings from LDA and Skip-gram. Combining NBOW with NBOW2 gives faster convergence during training.
منابع مشابه
Learning to retrieve out-of-vocabulary words in speech recognition
Many Proper Names (PNs) are Out-Of-Vocabulary (OOV) words for speech recognition systems used to process diachronic audio data. To help recovery of the PNs missed by the system, relevant OOV PNs can be retrieved out of the many OOVs by exploiting semantic context of the spoken content. In this paper, we propose two neural network models targeted to retrieve OOV PNs relevant to an audio document...
متن کاملSpoken Term Detection for Persian News of Islamic Republic of Iran Broadcasting
Islamic Republic of Iran Broadcasting (IRIB) as one of the biggest broadcasting organizations, produces thousands of hours of media content daily. Accordingly, the IRIBchr('39')s archive is one of the richest archives in Iran containing a huge amount of multimedia data. Monitoring this massive volume of data, and brows and retrieval of this archive is one of the key issues for this broadcasting...
متن کاملBag-of-words input for long history representation in neural network-based language models for speech recognition
In most of previous works on neural network based language models (NNLMs), the words are represented as 1-of-N encoded feature vectors. In this paper we investigate an alternative encoding of the word history, known as bag-of-words (BOW) representation of a word sequence, and use it as an additional input feature to the NNLM. Both the feedforward neural network (FFNN) and the long short-term me...
متن کاملExploring Concept Information for Mandarin Large Vocabulary Continuous Speech Recognition
Language modeling (LM) is part and parcel of automatic speech recognition (ASR), since it can assist ASR to constrain the acoustic analysis, guide the search through multiple candidate word strings, and quantify the acceptability of the final output hypothesis given an input utterance. This paper investigates and develops language model adaptation techniques for use in ASR and its main contribu...
متن کاملA bag-of-words equivalent recurrent neural network for action recognition
The traditional bag-of-words approach has found a wide range of applications in computer vision. The standard pipeline consists of a generation of a visual vocabulary, a quantization of the features into histograms of visual words, and a classification step for which usually a support vector machine in combination with a non-linear kernel is used. Given large amounts of data, however, the model...
متن کامل